Efficient Joinable Table Discovery in Data Lakes: A High-Dimensional Similarity-Based Approach

Author: Dong Yuyang
Oyamada Masafumi
Takeoka Kunihiro
Xiao Chuan
Publication date: 29/03/2021

Finding joinable tables in data lakes is a key procedure in many applications such as data integration, data augmentation, data analysis, and data market. Traditional approaches that find equi-joinable tables are unable to deal with misspellings and different formats, nor do they capture any semantic joins. In this paper, we propose PEXESO, a framework for joinable table discovery in data lakes. We embed textual values as high-dimensional vectors and join columns under similarity predicates on these vectors, thereby addressing the limitations of equi-join approaches and identifying more meaningful results. To efficiently find joinable tables with similarity, we propose a block-and-verify method that utilizes pivot-based filtering. A partitioning technique is developed to cope with the case when the data lake is large and the index cannot fit in main memory. An experimental evaluation on real datasets shows that our solution identifies substantially more tables than equi-joins and outperforms other similarity-based options, and that the join results are useful in data enrichment for machine learning tasks. The experiments also demonstrate the efficiency of the proposed method. Comment: Full version of paper in ICDE 202

arXiv.org e-Print Archive
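
The idea in the abstract above of joining columns under a similarity predicate on embedded values can be illustrated with a small brute-force sketch. It is not the PEXESO algorithm: the pivot-based filtering, block-and-verify, and partitioning steps are not reproduced, and the function names, the Euclidean distance, the distance threshold `tau`, and the fraction threshold `min_fraction` are all assumptions made for illustration. Embedding vectors are assumed to be given.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def joinability(query_vecs, target_vecs, tau):
    """Fraction of query vectors that have at least one target vector
    within distance tau (hypothetical similarity predicate)."""
    if not query_vecs:
        return 0.0
    matched = sum(
        1
        for q in query_vecs
        if any(euclidean(q, t) <= tau for t in target_vecs)
    )
    return matched / len(query_vecs)


def joinable_columns(query_vecs, lake, tau, min_fraction):
    """Names of columns in `lake` (a dict mapping column name to a list of
    vectors) whose joinability with the query column reaches min_fraction.
    Exhaustive scan over every column; no index or filtering is used."""
    return [
        name
        for name, vecs in lake.items()
        if joinability(query_vecs, vecs, tau) >= min_fraction
    ]


# Toy data: 2-dimensional stand-ins for embedded cell values.
query = [[0.0, 0.0], [1.0, 1.0]]
lake = {
    "a": [[0.1, 0.0], [1.0, 0.9]],  # both query values have a close match
    "b": [[5.0, 5.0]],              # no query value has a close match
    "c": [[0.0, 0.05]],             # only the first query value matches
}
result = joinable_columns(query, lake, tau=0.2, min_fraction=0.5)
```

The exhaustive scan compares every query vector with every target vector, which is the cost that an index with filtering is meant to avoid.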

DeepJoin: Joinable Table Discovery with Pre-trained Language Models

Author: Dong Yuyang
Enomoto Masafumi
Nozawa Takuma
Oyamada Masafumi
Xiao Chuan
Publication date: 23/06/2023

Because of its usefulness in data enrichment for data analysis tasks, joinable table discovery has become an important operation in data lake management. Existing approaches target equi-joins, the most common way of combining tables for creating a unified view, or semantic joins, which tolerate misspellings and different formats to deliver more join results. They are either exact solutions whose running time is linear in the sizes of the query column and the target table repository, or approximate solutions lacking precision. In this paper, we propose Deepjoin, a deep learning model for accurate and efficient joinable table discovery. Our solution is an embedding-based retrieval, which employs a pre-trained language model (PLM) and is designed as one framework serving both equi- and semantic joins. We propose a set of contextualization options to transform column contents to a text sequence. The PLM reads the sequence and is fine-tuned to embed columns to vectors such that columns are expected to be joinable if they are close to each other in the vector space. Since the output of the PLM is fixed in length, the subsequent search procedure becomes independent of the column size. With a state-of-the-art approximate nearest neighbor search algorithm, the search time is logarithmic in the repository size. To train the model, we devise techniques for preparing training data as well as data augmentation. The experiments on real datasets demonstrate that by training on a small subset of a corpus, Deepjoin generalizes to large datasets and its precision consistently outperforms that of other approximate solutions. Deepjoin is even more accurate than an exact solution to semantic joins when evaluated with labels from experts. Moreover, when equipped with a GPU, Deepjoin is up to two orders of magnitude faster than existing solutions.

arXiv.org e-Print Archive

Self-paced Weight Consolidation for Continual Learning

Author: Cong Wei
Cong Yang
Dong Jiahua
Liu Yuyang
Sun Gan
Publication date: 20/07/2023

Continual learning algorithms that keep the parameters of new tasks close to those of previous tasks are popular for preventing catastrophic forgetting in sequential task learning settings. However, 1) the performance of the new continual learner will be degraded without distinguishing the contributions of previously learned tasks; 2) the computational cost will increase greatly with the number of tasks, since most existing algorithms need to regularize all previous tasks when learning new tasks. To address the above challenges, we propose a self-paced Weight Consolidation (spWC) framework to attain robust continual learning by evaluating the discriminative contributions of previous tasks. Specifically, we develop a self-paced regularization to reflect the priorities of past tasks by measuring difficulty based on a key performance indicator (i.e., accuracy). When encountering a new task, all previous tasks are sorted from "difficult" to "easy" based on the priorities. Then the parameters of the new continual learner are learned by selectively maintaining the knowledge of the more difficult past tasks, which can overcome catastrophic forgetting with less computational cost. We adopt an alternative convex search to iteratively update the model parameters and priority weights in the bi-convex formulation. The proposed spWC framework is plug-and-play and is applicable to most continual learning algorithms (e.g., EWC, MAS and RCIL) in different directions (e.g., classification and segmentation). Experimental results on several public benchmark datasets demonstrate that our proposed framework can effectively improve performance when compared with other popular continual learning algorithms.

arXiv.org e-Print Archive

A Study on Ranking Query Processing for Multidimensional Data

Author: YUYANG DONG
董于洋
Publication date: 01/01/2019

筑波大学 (University of Tsukuba)201

Tsukuba Repository

Enhancement of Cement Paste with Carboxylated Carbon Nanotubes and Poly(Vinyl Alcohol)

Author: Dong Biqin
Hou Dongshuai
Ma Hongyan
Qiao Gang
Zhang Jinrui
Zhao Yuyang
Publication venue: Scholars' Mine
Publication date: 27/05/2022

Cement has been a major consumable material for construction in the world since its invention, but its low flexural strength is the main defect affecting the service life of structures. To adapt cement-based materials to a more stringent environment, carboxylated carbon nanotubes (CNTs-COOH) and poly(vinyl alcohol) (PVA) are proposed to enhance the mechanical properties of cement paste. This study systematically verifies the synergistic effect of CNTs-COOH/PVA on the performance of cement paste. First, UV-Vis spectroscopy and FTIR spectroscopy prove that CNTs-COOH can provide attachment sites for PVA and PVA can improve the dispersion and stability of CNTs-COOH in water, which demonstrates the feasibility of synergistically enhancing cement paste. When a 0.015% CNTs-COOH suspension with 0.1% PVA is added, the flexural strength of the cement paste increases by 73%, 32%, and 42% compared with control specimens at curing ages of 3, 7, and 28 days, respectively. The strength enhancement mechanism is revealed from the aspects of cement matrix enhancement and interface enhancement. Thermogravimetric (TG) analysis and mercury intrusion porosimetry (MIP) prove that CNTs-COOH can enhance the hydration degree of the cement matrix and fill the pores introduced by PVA. Based on the fact that PVA can improve the dispersibility and the nucleation site effect of CNTs-COOH in cement paste, molecular dynamics simulation confirms that PVA can bridge CNTs-COOH and C-S-H to enhance the interfacial bonding by 64.1%.

Missouri University of Science and Technology (Missouri S&T): Scholars' Mine

Gait Cycle-Inspired Learning Strategy for Continuous Prediction of Knee Joint Trajectory from sEMG

Author: Chen Yifeng
Dong Mingjie
Fu Xueming
Liu Haowen
Liu Luyan
Wei Dong
Xiong Wenxuan
Zhang Mingming
Zhang Yuyang
Zheng Hao
Zheng Yefeng
Zhong Wenjuan
Publication date: 24/07/2023

Predicting lower limb motion intent is vital for controlling exoskeleton robots and prosthetic limbs. Surface electromyography (sEMG) has attracted increasing attention in recent years as it enables ahead-of-time prediction of motion intentions before actual movement. However, estimating human joint trajectories remains a challenging problem due to inter- and intra-subject variations. The former is related to physiological differences (such as height and weight) and preferred walking patterns of individuals, while the latter is mainly caused by irregular and gait-irrelevant muscle activity. This paper proposes a model integrating two gait cycle-inspired learning strategies to mitigate the challenge of predicting human knee joint trajectory. The first strategy is to decouple knee joint angles into motion patterns and amplitudes; the former exhibit low variability while the latter show high variability among individuals. By learning through separate network entities, the model manages to capture both the common and personalized gait features. In the second strategy, muscle principal activation masks are extracted from gait cycles in a prolonged walk. These masks are used to filter out components unrelated to walking from raw sEMG and provide auxiliary guidance to capture more gait-related features. Experimental results indicate that our model could predict knee angles with an average root mean square error (RMSE) of 3.03(0.49) degrees, 50 ms ahead of time. To our knowledge, this is the best performance reported in the relevant literature, with RMSE reduced by at least 9.5%.

arXiv.org e-Print Archive

Does temporary transfer to preoperative hemodialysis influence postoperative outcomes in patients on peritoneal dialysis? A retrospective cohort study

Author: Jie Dong
Pengyuan Wang
Qingqing Zhou
Yuyang Zhang
Zeyang Chen
Publication venue: Frontiers Media SA
Publication date: 01/01/2023

Background: The associations between preoperative transfer to hemodialysis (HD) and postoperative outcomes in patients on chronic peritoneal dialysis (PD) remain unknown. We conducted this retrospective cohort study to investigate whether preoperative HD could influence surgical outcomes in PD patients undergoing major surgeries. Methods: All chronic PD patients who underwent major surgeries from January 1, 2007, to December 31, 2020, at Peking University First Hospital were screened. Major surgery was defined as surgical procedures under general, lumbar or epidural anesthesia, with more than an overnight hospital stay. Patients under the age of 18, with a dialysis duration of less than 3 months, and those who underwent renal implantation surgeries and procedures exclusively aimed at placing or removing PD catheters were excluded. Patients involved were divided into either HD or PD group based on their preoperative dialysis status for further analysis. Results: Of 105 PD patients enrolled, 65 continued PD, and 40 switched to HD preoperatively. Patients with preoperative HD were significantly more likely to develop postoperative hyperkalemia. The total complication rates were numerically higher in patients undergoing preoperative HD. After adjustment, the incidence of postoperative hyperkalemia or any other postoperative complication rates were similar between groups. There were no differences in long-term survival between the two groups. Conclusions: It does not seem indispensable for PD patients to switch to temporary HD before major surgeries.

Directory of Open Access Journals